PARADISE Based Search Engine at TREC 2009 Web Track
Abstract
In this paper, we introduce the PARADISE search engine as used in the TREC 2009 Web Track. PARADISE stands for Platform for Applying, Research and Developing Intelligent Search Engine, a search engine platform developed by the SEWM group at Peking University. The system is designed to support both English and Chinese information retrieval. It preprocessed and indexed the five hundred million web pages for this year's Web Track. In the preprocessing stage, templates were removed, encodings were identified and unified, and anchor texts and in-link information were extracted with the MapReduce framework (using Hadoop in this system). In retrieval, our runs used an extension of BM25. This model distinguishes terms from different fields and integrates both term counts and position information. Furthermore, some web-based features are also considered.
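The abstract does not give the formula for the BM25 extension, so the following is only a minimal sketch of one common way to make BM25 field-aware (a BM25F-style weighting), not the PARADISE model itself. The field names, the weights in `FIELD_WEIGHTS`, the parameters `K1` and `B`, and the functions `weighted_tf` and `score` are all illustrative assumptions, and the position information mentioned above is left out.

```python
import math

# Hypothetical field weights and BM25 parameters; none of these come from the paper.
FIELD_WEIGHTS = {"title": 3.0, "anchor": 2.0, "body": 1.0}
K1 = 1.2
B = 0.75


def weighted_tf(doc, term, avg_len):
    """Sum length-normalized term counts over all fields, scaled by field weight."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = doc.get(field, [])
        if not tokens:
            continue  # a missing or empty field contributes nothing
        tf = tokens.count(term)
        norm = 1.0 - B + B * len(tokens) / avg_len[field]
        total += weight * tf / norm
    return total


def score(query, doc, docs):
    """Score one document against a list of query terms, BM25F style."""
    n = len(docs)
    # Average field length over the collection; fall back to 1.0 for unused fields.
    avg_len = {
        f: (sum(len(d.get(f, [])) for d in docs) / n) or 1.0
        for f in FIELD_WEIGHTS
    }
    total = 0.0
    for term in query:
        # Document frequency: documents containing the term in any field.
        df = sum(1 for d in docs if any(term in d.get(f, []) for f in FIELD_WEIGHTS))
        idf = math.log(1.0 + (n - df + 0.5) / (df + 0.5))
        tf = weighted_tf(doc, term, avg_len)
        total += idf * tf / (K1 + tf)  # saturating term-frequency component
    return total


docs = [
    {"title": ["hadoop"], "body": ["web"]},
    {"title": ["web"], "body": ["hadoop"]},
]
title_hit = score(["hadoop"], docs[0], docs)
body_hit = score(["hadoop"], docs[1], docs)
```

With these assumed weights, a match in the title scores higher than the same match in the body, which is the kind of distinction between fields the abstract refers to.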
Similar resources
Overview of the TREC 2009 Blog Track
The Blog track explores the information seeking behaviour in the blogosphere. Thus far, since its inception in 2006 [9], the Blog track addressed two main search tasks based on the analysis of a commercial blog search engine: the opinion-finding task (i.e. “What do people think about X?”) and the blog distillation task (i.e. “Find me a blog with a principal, recurring interest in X.”). In TREC ...
Microsoft Research Asia at the Web Track of TREC 2009
In TREC 2009, we participate in the Web track, and focus on the diversity task. We propose to diversify web search results by first mining subtopics, and then rank results based on mined subtopics. We propose a model to diversify search results by considering both relevance of documents and richness of mined subtopics. Our experimental results show that the model improves diversity of search re...
Indri at TREC 2004: Terabyte Track
This paper provides an overview of experiments carried out at the TREC 2004 Terabyte Track using the Indri search engine. Indri is an efficient, effective distributed search engine. Like INQUERY, it is based on the inference network framework and supports structured queries, but unlike INQUERY, it uses language modeling probabilities within the network which allows for added flexibility. We des...
University of Padua at TREC 2013: Federated Web Search Track
This paper reports on the participation of the University of Padua in the TREC 2013 Federated Web Search track. The objective was the experimental investigation, in a Federated Web Search setting, of TWF·IRF, which is a recursive weighting scheme for resource selection. The experimental results show that the TWF component, which is peculiar to this scheme, is sufficient to obtain an effective search...
Dartmouth College at TREC 2007 Legal Track
This report describes Dartmouth College’s approach and results for the 2007 TREC Legal Track. Our original plan was to use the Combination of Expert Opinion (CEO) algorithm [1] to combine the search results from several search engines. However, we did not have enough time to build the index for more than one search engine before the submission deadline for official runs. The official results descr...